Generating abbreviations using Google Books library
Abstract
The article describes an original method for creating a dictionary of abbreviations based on the Google Books Ngram Corpus. Although the dictionary is built for Russian, the methodology is universal and can be applied to any language. The dictionary can be used to determine the function of the period (full stop) during text segmentation in various applied text-processing systems. The article describes the difficulties encountered while building the dictionary and the ways they were overcome. A model is constructed for estimating the probability of type I and type II errors (extraction precision and recall). Some statistics on the use of abbreviations are also provided.
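As a rough illustration of how such a dictionary could be used, the Python sketch below treats a period as a sentence boundary unless the token before it is in an abbreviation dictionary, and scores a list of extracted candidates with precision and recall against a reference list. The toy dictionary, the function names `split_sentences` and `precision_recall`, and the whitespace tokenization are hypothetical assumptions for illustration; this is not the article's extraction method or its error-probability model.

```python
# Hypothetical toy dictionary: abbreviations stored lowercased, without the final period.
# A real dictionary would be extracted from corpus data; these entries are illustrative only.
ABBREVIATIONS = {"г", "см", "стр", "т.е", "др"}


def split_sentences(text: str, abbreviations: set[str]) -> list[str]:
    """Split whitespace-tokenized text at tokens ending in a period,
    unless the token (minus the period) is a known abbreviation."""
    sentences: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        if not token.endswith("."):
            continue
        # Normalize: drop trailing periods and leading brackets/quotes, then lowercase.
        word = token.rstrip(".").lstrip("([«\"").lower()
        if word not in abbreviations:
            # Period after an ordinary word: treat it as a sentence boundary.
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing text without a final period still forms a segment.
        sentences.append(" ".join(current))
    return sentences


def precision_recall(extracted: set[str], reference: set[str]) -> tuple[float, float]:
    """Precision drops with false positives (type I), recall with false negatives (type II)."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Он родился в 1990 г. в Москве. Это было давно."
    print(split_sentences(text, ABBREVIATIONS))
    # "кот" is a deliberate false positive; "т.е" and "др" are missed.
    print(precision_recall({"г", "см", "стр", "кот"}, ABBREVIATIONS))  # (0.75, 0.6)
```

The rule is deliberately naive: an abbreviation that really does end a sentence (for example, "г." as the last token) is never split, which is the kind of ambiguity a fuller segmenter would need additional context to resolve.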
Similar articles
An Analytical Study of Online Public Access Catalogues in Comparison with Features of Amazon and Google: A Checklist Approach
Recent research in the field of cataloguing has confirmed that the library catalogue is losing its importance and that its service cannot match the level of Google, Google Scholar, Google Books and Amazon. Ultimately, this has led library users to bypass the library catalogue for their information needs. The scenario asserts the need for major shifts in cataloguing in various aspects and compels to e...
Exploring the Nature of the Smart Cities Research Landscape
As a research domain, Smart Cities is only emerging. This is evident from the number of publications, books, and other scholarly articles on smart cities indexed in Google scholar and Elsevier’s Scopus—an abstract and citation database. However, significant literature is available on related topics like intelligent city, digital city, and intelligent community based on search results research r...
Peachnote: Music Score Search and Analysis Platform
Hundreds of thousands of music scores are being digitized by libraries all over the world. In contrast to books, they generally remain inaccessible for content-based retrieval and algorithmic analysis. There is no analogue to Google Books for music scores, and there exist no large corpora of symbolic music data that would empower musicology in the way large text corpora are empowering computati...
Generating a dictionary of control models for event extraction
A subordination dictionary is important in a number of text processing applications. We present a method for generating such a dictionary for Russian verbs using Google Books Ngram data. The intended purpose of the dictionary is an event extraction system for Russian that uses the dictionary to define extraction patterns.
Alternative Metrics for Book Impact Assessment: Can Choice Reviews be a Useful Source?
This article assesses whether academic reviews in Choice: Current Reviews for Academic Libraries could be systematically used for indicators of scholarly impact, uptake or educational value for scholarly books. Based on 451 Choice book reviews from 2011 across the humanities, social sciences and science, there were significant but low correlations between Choice ratings and citation and non-cit...
Journal:
- CoRR
Volume: abs/1410.1080
Publication date: 2014